Overview

This .Rmd file starts from a cleaned .rdata file, in this case ‘roxbury_data_cleaned.rdata’, which holds the full data frame for all sensors in Roxbury. The code proceeds as follows:

  1. Loads packages
  2. Loads the .rdata file
  3. Filters by sensor or groups of sensors
  4. Sanity-checks the data with summary tables and time series
  5. Performs additional cleaning, as necessary
  6. Plots the data and runs exploratory data analysis

Loading initial packages

# Install any required packages that are not yet available
packages <- c("openair", "openairmaps", "leaflet", "dplyr", "chron", "timeDate", "data.table", "tidyr")
missing_packages <- packages[!sapply(packages, requireNamespace, quietly = TRUE)]
# Only call install.packages() when something is actually missing
if (length(missing_packages) > 0) install.packages(missing_packages)

# Attach the required packages for data manipulation and analysis
invisible(sapply(packages, library, character.only = TRUE))
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
## 
## Attaching package: 'data.table'
## The following objects are masked from 'package:dplyr':
## 
##     between, first, last
# Set options
knitr::opts_chunk$set(echo = FALSE, message = FALSE)

AIR QUALITY DATA LOADING

Load Air Quality Data file.
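A minimal sketch of the loading step, assuming the .rdata file holds a single data frame. The object name `roxbury_df` and the temporary file are hypothetical stand-ins so that the example runs on its own; in the report the path would point at the cleaned Roxbury file instead.

```r
# Stand-in for the cleaned sensor data (hypothetical object name and values).
roxbury_df <- data.frame(
  sn   = c("SENSOR-A", "SENSOR-B"),
  pm25 = c(6.9, 12.1)
)

# Write it to a temporary .rdata file so the example is self-contained.
rdata_path <- tempfile(fileext = ".rdata")
save(roxbury_df, file = rdata_path)
rm(roxbury_df)

# load() restores objects under their saved names and returns those names,
# which is a quick way to see what the file actually contains.
loaded_objects <- load(rdata_path)
print(loaded_objects)
str(roxbury_df)
```

Capturing the return value of `load()` is useful because the name of the restored object is fixed by whoever saved the file, not by the file name.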

Filter by the sensor(s) you want to analyze
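One way to do this filtering, sketched on toy data. The `sn` column name comes from the summary output in this report; the serial numbers and the object names `roxbury_df`, `sensors_of_interest` and `sensor_df` are hypothetical.

```r
# Toy data frame with the serial-number column `sn` seen in the summary output.
roxbury_df <- data.frame(
  sn   = c("SENSOR-A", "SENSOR-A", "SENSOR-B", "SENSOR-C"),
  pm25 = c(6.9, 12.1, 3.2, 9.4)
)

# Hypothetical serial numbers of the sensors to analyze.
sensors_of_interest <- c("SENSOR-A", "SENSOR-C")

# Base R subsetting; with dplyr attached, the equivalent would be
# dplyr::filter(roxbury_df, sn %in% sensors_of_interest).
sensor_df <- roxbury_df[roxbury_df$sn %in% sensors_of_interest, ]

# Row counts per sensor confirm that the filter kept what was intended.
print(table(sensor_df$sn))
```

If the dplyr version is preferred, writing `dplyr::filter()` explicitly sidesteps the masking of `stats::filter` reported when the packages were attached.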

CLEANING STEPS - already applied to ‘roxbury_df_filtered.rdata’, but kept here in case further cleaning is needed.

Define threshold values

Replace values above the thresholds defined above; filter out values below zero
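The two cleaning steps can be sketched together as follows. The threshold values here are placeholders, not the ones used for this dataset. The sketch also assumes that "replace" means setting a reading to NA, and that negative readings are set to NA as well rather than dropping whole rows; both are assumptions about the hidden code.

```r
# Toy data with a few implausible readings (values are made up).
sensor_df <- data.frame(
  pm1  = c(5.8, 75.0, -1.2, 10.7),
  pm25 = c(6.9, 12.1, 250.0, -0.5)
)

# Hypothetical upper limits per pollutant; adjust to the sensor's valid range.
thresholds <- c(pm1 = 50, pm25 = 100)

for (pollutant in names(thresholds)) {
  values <- sensor_df[[pollutant]]
  # Treat readings above the limit or below zero as missing.
  out_of_range <- !is.na(values) &
    (values > thresholds[[pollutant]] | values < 0)
  values[out_of_range] <- NA
  sensor_df[[pollutant]] <- values
}

print(sensor_df)
```

Keeping the limits in one named vector means a new pollutant can be added to the cleaning step by adding a single entry.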

Summary statistics

##       pm1              pm25             pm10              co         
##  Min.   : 0.000   Min.   : 0.000   Min.   :  0.00   Min.   :  32.66  
##  1st Qu.: 2.460   1st Qu.: 3.200   1st Qu.:  7.70   1st Qu.: 233.50  
##  Median : 5.750   Median : 6.890   Median : 16.66   Median : 280.84  
##  Mean   : 7.805   Mean   : 9.165   Mean   : 22.91   Mean   : 341.33  
##  3rd Qu.:10.660   3rd Qu.:12.060   3rd Qu.: 29.49   3rd Qu.: 372.15  
##  Max.   :49.990   Max.   :99.980   Max.   :996.71   Max.   :4979.62  
##  NA's   :4008     NA's   :1414     NA's   :1113     NA's   :25084    
##        no               no2               o3        
##  Min.   :  1.080   Min.   :  1.00   Min.   :  0.00  
##  1st Qu.:  1.940   1st Qu.:  9.79   1st Qu.: 18.94  
##  Median :  2.490   Median : 20.01   Median : 24.86  
##  Mean   :  5.423   Mean   : 17.82   Mean   : 25.44  
##  3rd Qu.:  3.630   3rd Qu.: 24.71   3rd Qu.: 31.84  
##  Max.   :150.000   Max.   :374.92   Max.   :196.17  
##  NA's   :30042     NA's   :24443    NA's   :27512   
##  timestamp_local.x                    sn           
##  Min.   :2022-07-01 14:31:57.0   Length:422704     
##  1st Qu.:2022-09-22 12:34:00.0   Class :character  
##  Median :2023-02-24 20:55:18.0   Mode  :character  
##  Mean   :2023-01-25 03:29:09.7                     
##  3rd Qu.:2023-05-18 21:33:49.0                     
##  Max.   :2023-07-31 23:59:49.0                     
##                                                    
##    timestamp                          met.rh         met.temp         met.wd 
##  Min.   :2022-07-01 18:31:57.00   Min.   : 0.00   Min.   :-7.10   Min.   :0  
##  1st Qu.:2022-09-22 16:34:00.00   1st Qu.:42.40   1st Qu.:11.30   1st Qu.:0  
##  Median :2023-02-25 01:55:18.00   Median :54.70   Median :18.60   Median :0  
##  Mean   :2023-01-25 07:38:23.14   Mean   :56.48   Mean   :18.03   Mean   :0  
##  3rd Qu.:2023-05-19 01:33:49.00   3rd Qu.:72.90   3rd Qu.:25.50   3rd Qu.:0  
##  Max.   :2023-08-01 03:59:49.00   Max.   :99.20   Max.   :38.00   Max.   :0  
##                                                                              
##      met.ws     met.xrh         met.xtemp           bin0        
##  Min.   :0   Min.   : NA      Min.   : NA      Min.   :  0.000  
##  1st Qu.:0   1st Qu.: NA      1st Qu.: NA      1st Qu.:  2.449  
##  Median :0   Median : NA      Median : NA      Median :  4.996  
##  Mean   :0   Mean   :NaN      Mean   :NaN      Mean   :  8.319  
##  3rd Qu.:0   3rd Qu.: NA      3rd Qu.: NA      3rd Qu.: 10.612  
##  Max.   :0   Max.   : NA      Max.   : NA      Max.   :147.231  
##              NA's   :422704   NA's   :422704                    
##       bin1               bin2              bin3               bin4        
##  Min.   :  0.0000   Min.   : 0.0000   Min.   : 0.00000   Min.   : 0.0000  
##  1st Qu.:  0.2771   1st Qu.: 0.0875   1st Qu.: 0.02010   1st Qu.: 0.0237  
##  Median :  0.5744   Median : 0.1797   Median : 0.04380   Median : 0.0506  
##  Mean   :  1.2059   Mean   : 0.3643   Mean   : 0.09701   Mean   : 0.1136  
##  3rd Qu.:  1.1945   3rd Qu.: 0.3669   3rd Qu.: 0.09470   3rd Qu.: 0.1099  
##  Max.   :102.4518   Max.   :77.3930   Max.   :39.10640   Max.   :56.8451  
##                                                                           
##       bin5               bin6              bin7              bin8       
##  Min.   : 0.00000   Min.   :0.00000   Min.   :0.00000   Min.   :0.0000  
##  1st Qu.: 0.01630   1st Qu.:0.01100   1st Qu.:0.00350   1st Qu.:0.0029  
##  Median : 0.03770   Median :0.02600   Median :0.01010   Median :0.0080  
##  Mean   : 0.08477   Mean   :0.04907   Mean   :0.01695   Mean   :0.0145  
##  3rd Qu.: 0.08170   3rd Qu.:0.05170   3rd Qu.:0.02100   3rd Qu.:0.0175  
##  Max.   :17.70360   Max.   :7.40880   Max.   :3.81450   Max.   :4.0310  
##                                                                         
##       bin9              bin10              bin11              bin12          
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.000000   Min.   :0.0000000  
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.0000000  
##  Median :0.003200   Median :0.000000   Median :0.000000   Median :0.0000000  
##  Mean   :0.006269   Mean   :0.002379   Mean   :0.001089   Mean   :0.0004622  
##  3rd Qu.:0.007700   3rd Qu.:0.003400   3rd Qu.:0.000000   3rd Qu.:0.0000000  
##  Max.   :2.433200   Max.   :1.392600   Max.   :0.759800   Max.   :0.4466000  
##                                                                              
##      bin13               bin14               bin15          
##  Min.   :0.0000000   Min.   :0.000e+00   Min.   :0.000e+00  
##  1st Qu.:0.0000000   1st Qu.:0.000e+00   1st Qu.:0.000e+00  
##  Median :0.0000000   Median :0.000e+00   Median :0.000e+00  
##  Mean   :0.0001962   Mean   :9.647e-05   Mean   :5.664e-05  
##  3rd Qu.:0.0000000   3rd Qu.:0.000e+00   3rd Qu.:0.000e+00  
##  Max.   :0.1687000   Max.   :1.169e-01   Max.   :1.044e-01  
##                                                             
##      bin16               bin17               pm1num             lat       
##  Min.   :0.000e+00   Min.   :0.000e+00   Min.   :  0.000   Min.   :42.33  
##  1st Qu.:0.000e+00   1st Qu.:0.000e+00   1st Qu.:  2.918   1st Qu.:42.33  
##  Median :0.000e+00   Median :0.000e+00   Median :  5.859   Median :42.33  
##  Mean   :3.784e-05   Mean   :2.634e-05   Mean   :  9.889   Mean   :42.33  
##  3rd Qu.:0.000e+00   3rd Qu.:0.000e+00   3rd Qu.: 12.185   3rd Qu.:42.33  
##  Max.   :7.370e-02   Max.   :8.020e-02   Max.   :240.690   Max.   :42.33  
##                                                                           
##       lon           sitename         mod_date_1min                   
##  Min.   :-71.09   Length:422704      Min.   :2022-07-01 14:32:00.00  
##  1st Qu.:-71.09   Class :character   1st Qu.:2022-09-22 12:33:45.00  
##  Median :-71.09   Mode  :character   Median :2023-02-24 20:55:30.00  
##  Mean   :-71.09                      Mean   :2023-01-25 02:38:26.23  
##  3rd Qu.:-71.09                      3rd Qu.:2023-05-18 21:34:15.00  
##  Max.   :-71.09                      Max.   :2023-08-01 00:00:00.00  
##                                                                      
##  original_met_time       tmpc             wd            ws       
##  Length:422704      Min.   :-8.89   Min.   :  0   Min.   :0.000  
##  Class :character   1st Qu.: 9.44   1st Qu.:110   1st Qu.:3.084  
##  Mode  :character   Median :16.11   Median :210   Median :4.626  
##                     Mean   :15.80   Mean   :194   Mean   :4.580  
##                     3rd Qu.:22.22   3rd Qu.:280   3rd Qu.:5.654  
##                     Max.   :37.22   Max.   :360   Max.   :9.766  
##                                                   NA's   :5612   
##  timestamp_local.y       date                       
##  Length:422704      Min.   :2022-07-01 14:31:57.00  
##  Class :character   1st Qu.:2022-09-22 12:34:00.00  
##  Mode  :character   Median :2023-02-24 20:55:18.00  
##                     Mean   :2023-01-25 02:38:23.14  
##                     3rd Qu.:2023-05-18 21:33:49.00  
##                     Max.   :2023-07-31 23:59:49.00  
## 
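
The summary above shows several columns that carry no information: met.wd and met.ws are zero throughout, and met.xrh and met.xtemp are NA for all 422704 rows. A minimal sketch for flagging such columns automatically, using toy data and hypothetical object names:

```r
# Toy data mimicking the pattern in the summary: one column that is all NA
# and one that never changes (values are made up).
sensor_df <- data.frame(
  pm25    = c(6.9, 12.1, NA, 9.4),
  met.wd  = c(0, 0, 0, 0),
  met.xrh = c(NA_real_, NA_real_, NA_real_, NA_real_)
)

# Count missing values per column, as summary() reports in its NA's row.
na_counts <- colSums(is.na(sensor_df))

# Count distinct non-missing values per column.
distinct_counts <- sapply(sensor_df, function(x) length(unique(x[!is.na(x)])))

all_na_cols   <- names(na_counts)[na_counts == nrow(sensor_df)]
constant_cols <- names(distinct_counts)[distinct_counts == 1]

print(all_na_cols)
print(constant_cols)
```

For the directional analysis later in the report, the separate wd and ws columns, which span 0 to 360 and 0 to 9.766 in the summary, look like the usable wind fields.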

Date formatting
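
openair functions expect a column named `date` of class POSIXct. The sketch below shows one way to build it from a UTC timestamp. The input string format and the local zone "America/New_York" are assumptions; the zone is inferred from the four-hour summer offset between the timestamp and timestamp_local.x columns in the summary.

```r
# UTC timestamps as character strings (assumed format: "YYYY-MM-DD HH:MM:SS").
sensor_df <- data.frame(
  timestamp = c("2022-07-01 18:31:57", "2023-01-15 12:00:00"),
  pm25      = c(6.9, 12.1)
)

# Parse as UTC, then relabel the time zone for display. The instant in time
# is unchanged; only the printed clock time shifts.
sensor_df$date <- as.POSIXct(sensor_df$timestamp,
                             format = "%Y-%m-%d %H:%M:%S", tz = "UTC")
attr(sensor_df$date, "tzone") <- "America/New_York"

print(format(sensor_df$date, "%Y-%m-%d %H:%M:%S %Z"))
```

Parsing in UTC first and converting afterwards avoids ambiguous local times around daylight saving transitions.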

Time series
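
A hedged sketch of a time-series sanity check on synthetic data. The daily aggregation is plain base R; the `openair::timePlot()` call only runs if openair is installed, and the choice of daily averaging is an assumption rather than the setting used in this report.

```r
set.seed(1)

# Two weeks of synthetic hourly data (values are made up).
dates <- seq(as.POSIXct("2023-01-01 00:00:00", tz = "UTC"),
             by = "hour", length.out = 24 * 14)
sensor_df <- data.frame(
  date = dates,
  pm25 = pmax(0, rnorm(length(dates), mean = 9, sd = 4)),
  pm10 = pmax(0, rnorm(length(dates), mean = 23, sd = 10))
)

# Daily means smooth out short-term noise and make gaps easier to spot.
sensor_df$day <- as.Date(sensor_df$date)
daily_means <- aggregate(cbind(pm25, pm10) ~ day, data = sensor_df, FUN = mean)
print(head(daily_means))

# If openair is installed, timePlot() draws one panel per pollutant.
if (requireNamespace("openair", quietly = TRUE)) {
  openair::timePlot(sensor_df[, c("date", "pm25", "pm10")],
                    pollutant = c("pm25", "pm10"), avg.time = "day")
}
```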

EXPLORATORY DATA ANALYSIS

Temporal variability: Diurnal and annual/seasonal profiles
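
A diurnal profile is the mean concentration for each hour of the day. The sketch below computes one in base R on synthetic data with an artificial evening peak, then hands the same data to `openair::timeVariation()` if openair is installed. All values and object names are illustrative.

```r
set.seed(2)

# Sixty days of synthetic hourly data with an artificial evening peak.
dates <- seq(as.POSIXct("2023-03-01 00:00:00", tz = "UTC"),
             by = "hour", length.out = 24 * 60)
hour_of_day <- as.integer(format(dates, "%H"))
sensor_df <- data.frame(
  date = dates,
  hour = hour_of_day,
  pm25 = 8 + 4 * (hour_of_day %in% 17:20) + rnorm(length(dates), sd = 1)
)

# Diurnal profile: mean concentration for each hour of the day.
diurnal <- aggregate(pm25 ~ hour, data = sensor_df, FUN = mean)
print(diurnal[which.max(diurnal$pm25), ])

# If openair is installed, timeVariation() draws hour, weekday and month panels.
if (requireNamespace("openair", quietly = TRUE)) {
  openair::timeVariation(sensor_df[, c("date", "pm25")], pollutant = "pm25")
}
```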

Directional analysis of pollutants

Create polar plots (and related plots from the same family)
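
A sketch of the directional analysis on synthetic data, where concentrations are deliberately higher for south-westerly winds. The sector means are a base R cross-check; the openair calls only run if the package is installed. The four-cluster tables in this section resemble what `polarCluster()` reports with `n.clusters = 4`, but that is an inference from the output, not a confirmed setting.

```r
set.seed(3)

# Synthetic data: concentrations are higher when the wind blows from the
# south-west quadrant (values are made up).
n <- 3000
wd <- runif(n, min = 0, max = 360)
ws <- runif(n, min = 0.5, max = 10)
from_sw <- wd >= 180 & wd < 270
sensor_df <- data.frame(
  date = seq(as.POSIXct("2023-01-01 00:00:00", tz = "UTC"),
             by = "hour", length.out = n),
  ws   = ws,
  wd   = wd,
  pm25 = 6 + 5 * from_sw + rnorm(n, sd = 1)
)

# Mean concentration per 90-degree wind sector as a quick numeric check.
sensor_df$sector <- cut(sensor_df$wd, breaks = c(0, 90, 180, 270, 360),
                        labels = c("NE", "SE", "SW", "NW"),
                        include.lowest = TRUE)
sector_means <- aggregate(pm25 ~ sector, data = sensor_df, FUN = mean)
print(sector_means)

if (requireNamespace("openair", quietly = TRUE)) {
  polar_input <- sensor_df[, c("date", "ws", "wd", "pm25")]
  # Smoothed concentration surface by wind speed and direction.
  openair::polarPlot(polar_input, pollutant = "pm25")
  # Group the surface into four clusters of similar conditions.
  openair::polarCluster(polar_input, pollutant = "pm25", n.clusters = 4)
}
```

The repeated warning in this section states that `statistic == 'frequency'` cannot be combined with a pollutant. Passing `statistic = "mean"` explicitly to the call that raises it (possibly `polarFreq()`, whose default statistic is "frequency") should silence it.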

## Warning: ✖ `statistic == 'frequency'` incompatible with a defined pollutant.
## ℹ Setting statistic to `'mean'`.

## # A tibble: 4 × 5
##   cluster mean_pm1      n n_percent pm1_percent
##   <chr>      <dbl>  <int>     <dbl>       <dbl>
## 1 C1          8.06  76985      18.6        19.1
## 2 C2          5.35  39756       9.6         6.5
## 3 C3          9.38 225557      54.6        65.1
## 4 C4          4.28  70831      17.1         9.3

## Warning: ✖ `statistic == 'frequency'` incompatible with a defined pollutant.
## ℹ Setting statistic to `'mean'`.

## # A tibble: 4 × 5
##   cluster mean_pm25      n n_percent pm25_percent
##   <chr>       <dbl>  <int>     <dbl>        <dbl>
## 1 C1           8.46  92986      22.4         20.5
## 2 C2          11.2  182414      43.9         53.3
## 3 C3           8.81  83186      20           19.1
## 4 C4           4.77  57137      13.7          7.1

## Warning: ✖ `statistic == 'frequency'` incompatible with a defined pollutant.
## ℹ Setting statistic to `'mean'`.

## # A tibble: 4 × 5
##   cluster mean_pm10      n n_percent pm10_percent
##   <chr>       <dbl>  <int>     <dbl>        <dbl>
## 1 C1           21.8 122907      29.5         28.1
## 2 C2           27.8 160830      38.7         47  
## 3 C3           47.3  18559       4.5          9.2
## 4 C4           13.1 113728      27.3         15.7

## Warning: ✖ `statistic == 'frequency'` incompatible with a defined pollutant.
## ℹ Setting statistic to `'mean'`.

## # A tibble: 4 × 5
##   cluster mean_co      n n_percent co_percent
##   <chr>     <dbl>  <int>     <dbl>      <dbl>
## 1 C1         386. 175444      44.7       50.5
## 2 C2         341.  84702      21.6       21.5
## 3 C3         291.  92873      23.7       20.2
## 4 C4         270.  39042      10          7.8

## Warning: ✖ `statistic == 'frequency'` incompatible with a defined pollutant.
## ℹ Setting statistic to `'mean'`.

## # A tibble: 4 × 5
##   cluster mean_o3      n n_percent o3_percent
##   <chr>     <dbl>  <int>     <dbl>      <dbl>
## 1 C1         27.9  95890      24.6       27.1
## 2 C2         28.6  56828      14.6       16.5
## 3 C3         28.8  72705      18.7       21.2
## 4 C4         21.2 164232      42.1       35.3

## Warning: ✖ `statistic == 'frequency'` incompatible with a defined pollutant.
## ℹ Setting statistic to `'mean'`.

## # A tibble: 4 × 5
##   cluster mean_no      n n_percent no_percent
##   <chr>     <dbl>  <int>     <dbl>      <dbl>
## 1 C1         6.49 198370      51.2       61.2
## 2 C2         4.52 103146      26.6       22.1
## 3 C3         3.36  43626      11.3        7  
## 4 C4         4.86  42052      10.9        9.7

## Warning: ✖ `statistic == 'frequency'` incompatible with a defined pollutant.
## ℹ Setting statistic to `'mean'`.

## # A tibble: 4 × 5
##   cluster mean_no2      n n_percent no2_percent
##   <chr>      <dbl>  <int>     <dbl>       <dbl>
## 1 C1          14.6  35976       9.2         7.5
## 2 C2          17.5 161836      41.2        40.4
## 3 C3          16.8  98355      25          23.6
## 4 C4          20.7  96511      24.6        28.5

Create polar map plots
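
A sketch of a polar map built with `openairmaps::polarMap()`, assuming the `lat`, `lon` and `sitename` columns seen in the summary output. The two sites, their coordinates and all readings are synthetic, and the map is only built if openairmaps is installed.

```r
set.seed(4)

# Synthetic readings for two hypothetical sites near the coordinates shown in
# the summary output (all values are made up).
n <- 2000
make_site <- function(name, lat, lon) {
  data.frame(
    date     = seq(as.POSIXct("2023-01-01 00:00:00", tz = "UTC"),
                   by = "hour", length.out = n),
    sitename = name,
    lat      = lat,
    lon      = lon,
    ws       = runif(n, min = 0.5, max = 10),
    wd       = runif(n, min = 0, max = 360),
    pm25     = pmax(0, rnorm(n, mean = 9, sd = 4))
  )
}
map_df <- rbind(make_site("Site 1", 42.33, -71.09),
                make_site("Site 2", 42.32, -71.08))

# Each site needs exactly one latitude/longitude pair to be placed on the map.
site_coords <- unique(map_df[, c("sitename", "lat", "lon")])
print(site_coords)

if (requireNamespace("openairmaps", quietly = TRUE)) {
  polar_map <- openairmaps::polarMap(map_df, pollutant = "pm25",
                                     latitude = "lat", longitude = "lon")
  # Printing polar_map in an interactive session or a knitted HTML report
  # renders the leaflet widget.
}
```

Checking `site_coords` first is worthwhile because a sensor with more than one coordinate pair would be drawn as several separate markers.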